Latency Tolerant Branch Predictors
Abstract
The access latency of branch predictors is a well-known problem in fetch engine design. Prediction overriding techniques are commonly used to overcome this problem. However, prediction overriding requires a complex recovery mechanism to discard the speculative work performed on overridden predictions. In this paper, we show that stream and trace predictors, which use long basic prediction units, can tolerate access latency without overriding, thus reducing fetch engine complexity. We show that both the stream fetch engine and the trace cache architecture, even without overriding, outperform other efficient fetch engines, such as an EV8-like fetch architecture or the FTB fetch engine, even when those engines do use overriding.
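As a rough illustration of the contrast the abstract draws, the sketch below models a fetch engine with a multi-cycle predictor in two ways: one prediction per long unit (stream or trace), where the next prediction is requested while the current unit is still being fetched, versus one prediction per fetch block backed by a fast predictor that a slower predictor may later override, forcing a squash. All parameters (PREDICTOR_LATENCY, FETCH_WIDTH, override_rate, and the unit lengths) are illustrative assumptions, not figures from the paper.

```python
# Toy model, not the paper's simulator: long prediction units vs. overriding.

PREDICTOR_LATENCY = 2   # cycles until the slow, accurate predictor answers
FETCH_WIDTH = 8         # instructions fetched per cycle

def cycles_with_long_units(unit_lengths):
    """One prediction per stream/trace: while the current unit is fetched,
    the next prediction is already in flight, so the latency is hidden
    whenever the unit takes at least PREDICTOR_LATENCY cycles to fetch."""
    cycles = 0
    for length in unit_lengths:
        fetch_cycles = (length + FETCH_WIDTH - 1) // FETCH_WIDTH
        cycles += max(fetch_cycles, PREDICTOR_LATENCY)
    return cycles

def cycles_with_overriding(block_lengths, override_rate=0.05):
    """One prediction per fetch block: a fast predictor guesses immediately,
    and the slow predictor overrides it PREDICTOR_LATENCY cycles later with
    probability override_rate, squashing the speculative fetch done meanwhile."""
    cycles = 0.0
    for length in block_lengths:
        cycles += (length + FETCH_WIDTH - 1) // FETCH_WIDTH
        cycles += override_rate * PREDICTOR_LATENCY   # expected squash penalty
    return cycles

if __name__ == "__main__":
    streams = [24, 32, 16, 40]   # long prediction units (illustrative lengths)
    blocks = [8] * 14            # the same 112 instructions in 8-wide blocks
    print("long-unit engine, no overriding:", cycles_with_long_units(streams), "cycles")
    print("block engine with overriding:  ", cycles_with_overriding(blocks), "cycles")
```

The point carried over from the paper is only qualitative: when a single prediction covers enough instructions to span the predictor latency, that latency is overlapped without any overriding or recovery logic.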
Similar references
Reconsidering Complex Branch Predictors
To sustain instruction throughput rates in more aggressively clocked microarchitectures, microarchitects have incorporated larger and more complex branch predictors into their designs, taking advantage of the increasing numbers of transistors available on a chip. Unfortunately, because of penalties associated with their implementations, the extra accuracy provided by many branch predictors does...
Cost-Effective Graceful Degradation in Speculative Processor Subsystems: The Branch Prediction Case
We analyze the effect of errors in branch predictors, a representative example of speculative processor subsystems, to motivate the necessity for fault tolerance in such subsystems. We also describe the design of fault-tolerant branch predictors using general fault tolerance techniques. We then propose a fault-tolerant implementation that utilizes the Finite State Machine (FSM) structure of the...
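As a hedged illustration of applying a general fault-tolerance technique to predictor state (this is not the FSM-based design the abstract refers to), the sketch below adds a parity bit to each 2-bit counter of a gshare-style table and falls back to a neutral counter value when a parity mismatch signals a soft error. Table size and indexing are illustrative assumptions.

```python
class ParityProtectedGshare:
    """Gshare-style predictor whose 2-bit counters carry a parity bit so that
    soft errors are detected on read and replaced by a neutral prediction."""

    def __init__(self, index_bits=12):
        self.mask = (1 << index_bits) - 1
        self.history = 0
        # each entry: (2-bit counter, parity bit over the counter)
        self.table = [(2, self._parity(2)) for _ in range(1 << index_bits)]

    @staticmethod
    def _parity(counter):
        return bin(counter).count("1") & 1

    def _index(self, pc):
        return (pc ^ self.history) & self.mask

    def predict(self, pc):
        idx = self._index(pc)
        counter, parity = self.table[idx]
        if self._parity(counter) != parity:           # detected soft error
            counter = 2                               # reset to weakly taken
            self.table[idx] = (counter, self._parity(counter))
        return counter >= 2                           # True = predict taken

    def update(self, pc, taken):
        idx = self._index(pc)
        counter, _ = self.table[idx]
        counter = min(3, counter + 1) if taken else max(0, counter - 1)
        self.table[idx] = (counter, self._parity(counter))
        self.history = ((self.history << 1) | int(taken)) & self.mask
```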
Neural Branch Prediction
The new neural predictor improves accuracy by combining path and pattern history to overcome limitations inherent in previous predictors. It uses a different prediction algorithm that allows parallel execution of instructions during every prediction, thereby keeping the latency low. In fact, the fast path-based neural predictor has a latency comparable to the predictors from industrial desi...
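For context, the sketch below shows the basic global-history perceptron predictor that path-based neural predictors build on: a weight vector selected by the branch address is dot-multiplied with the branch history, the sign of the sum gives the prediction, and training occurs on a misprediction or a low-confidence output. The table size, history length, and threshold formula are common settings from the perceptron-predictor literature, not parameters taken from this abstract.

```python
HISTORY_LEN = 16
NUM_PERCEPTRONS = 256
THRESHOLD = int(1.93 * HISTORY_LEN + 14)   # commonly cited training threshold

class PerceptronPredictor:
    def __init__(self):
        # one weight vector per entry: a bias weight plus one weight per history bit
        self.weights = [[0] * (HISTORY_LEN + 1) for _ in range(NUM_PERCEPTRONS)]
        self.history = [1] * HISTORY_LEN       # +1 = taken, -1 = not taken

    def _select(self, pc):
        return self.weights[pc % NUM_PERCEPTRONS]

    def predict(self, pc):
        w = self._select(pc)
        y = w[0] + sum(wi * hi for wi, hi in zip(w[1:], self.history))
        return y >= 0, y                       # (predicted taken?, perceptron output)

    def update(self, pc, taken, y):
        w = self._select(pc)
        t = 1 if taken else -1
        # train on a misprediction or when the output magnitude is not confident
        if (y >= 0) != taken or abs(y) <= THRESHOLD:
            w[0] += t
            for i, hi in enumerate(self.history):
                w[i + 1] += t * hi
        self.history = self.history[1:] + [t]  # shift in the latest outcome

# usage: pred, y = p.predict(pc); ...resolve the branch...; p.update(pc, taken, y)
```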
Tolerating Branch Predictor Latency on SMT
Simultaneous Multithreading (SMT) tolerates latency by executing instructions from multiple threads. If a thread is stalled, its resources can be used by other threads. However, fetch stall conditions caused by multi-cycle branch predictors prevent SMT from achieving its full potential performance, since the flow of fetched instructions is halted. This paper proposes and evaluates solutions to deal with...
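One plausible way to keep fetch flowing, sketched below under assumed parameters (it is not necessarily the policy this paper evaluates), is to let the fetch stage skip any thread whose multi-cycle branch prediction is still in flight and pick a ready thread instead, here with an ICOUNT-like fewest-fetched tie-breaker.

```python
from dataclasses import dataclass

PREDICTOR_LATENCY = 3   # assumed predictor access latency in cycles

@dataclass
class ThreadState:
    tid: int
    prediction_ready_at: int = 0    # cycle when the pending prediction arrives
    fetched: int = 0

def select_fetch_thread(threads, cycle):
    """Pick the ready thread with the fewest fetched instructions;
    return None if every thread is waiting on its predictor."""
    ready = [t for t in threads if t.prediction_ready_at <= cycle]
    return min(ready, key=lambda t: t.fetched, default=None)

def fetch_cycle(threads, cycle):
    t = select_fetch_thread(threads, cycle)
    if t is None:
        return None                                     # all threads stalled
    t.fetched += 8                                      # fetch one block
    t.prediction_ready_at = cycle + PREDICTOR_LATENCY   # next prediction in flight
    return t.tid

if __name__ == "__main__":
    threads = [ThreadState(tid=i) for i in range(4)]
    for c in range(8):
        print("cycle", c, "-> fetched from thread", fetch_cycle(threads, c))
```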
OH-SNAP: Optimized Hybrid Scaled Neural Analog Predictor
Neural-based branch predictors have been among the most accurate in the literature. The recently proposed scaled neural analog predictor, or SNAP, builds on piecewise-linear branch prediction and relies on a mixed analog/digital implementation to mitigate latency as well as power requirements over previous neural predictors. I present an optimized version of the SNAP predictor, hybridized with ...